On the incompleteness of the moment and correlation function 
hierarchy as probes of the lognormal field. 
ABSTRACT

We trace, with analytical methods and in a manner independent of the
model parameter, the independent bits of Fisher information carried by
each moment of the lognormal distribution, a now standard prescription
for the distribution of the cosmological matter density field as it
departs from Gaussian initial conditions. We show that, when entering the
regime of large fluctuations, only a tiny, rapidly decaying fraction of
the total information content remains accessible through the extraction
of the full series of moments of the field. This is due to a known
peculiarity of heavy-tailed distributions: they cannot be uniquely
recovered from the values of all of their moments. Under this lognormal
assumption, cosmological probes such as the correlation function
hierarchy, or equivalently their Fourier transforms, are therefore
fundamentally limited once the field becomes non-linear, for any
parameter of interest. We show that the fraction of the information
accessible from two-point correlations decays to zero following the
inverse squared variance of the field. We discuss which general
properties of a random field's probability density function make the
correlation function hierarchy an efficient or inefficient, complete or
incomplete set of probes of any model parameter.

Subject headings: cosmology: theory; large-scale structure of universe



1. Introduction 

The cosmological matter density field is becoming more and more directly
accessible to observations with the help of weak lensing (Schneider et
al. 199?; Bartelmann & Schneider 2001; Refregier 2003; Munshi et al.
2006). Its statistical properties are the key element in trying to
optimize future large galaxy surveys aimed at answering current
fundamental cosmological questions, such as the nature of the dark
components of the universe (Caldwell & Kamionkowski 2009; Frieman et al.
2008). To this aim, Fisher's measure of information on parameters
(Fisher 1925; Rao 1973; van den Bos 2007) has naturally become of
standard use in cosmology. It provides a handy framework in which it is
possible to evaluate in a quantitative manner the statistical power of
some experiment configuration aimed at some observable (e.g. Tegmark et
al. 1997; Tegmark 1997; Hu & Tegmark 1999; Hu & Jain 2004; Amara &
Refregier 2007; Parkinson et al. 2007; Albrecht et al. 2006; Bernstein
2009). Such studies are in the vast majority of cases limited to Gaussian
probability density functions, or perturbations therefrom, and deal
mostly with the prominent members of the correlation function hierarchy
(Peebles 1980), or equivalently their Fourier transforms, the
polyspectra, such as the matter power spectrum.

The approach via the correlation function hierarchy is very sensible in
the nearly linear regime for at least two reasons. First, in principle,
the correlations are the very elements that cosmological perturbation
theory is able to predict in a systematic manner (see Bernardeau et al.
2002 for a review, or the more recent Matsubara 2011 and the numerous
references in it). Second, primordial cosmological fluctuation fields
are believed to be accurately described by Gaussian statistics. It is
well known that the correlations at the two-point level provide a
complete description of Gaussian fields. It is therefore natural to
expect this approach to be adequate throughout the linear and the mildly
non-linear regime, when departures from Gaussianity are small.
Deeper in the non-linear regime, fluctuations grow substantially in
size, and tails in the matter probability density function do form. A
standard prescription for the statistics of the matter field in these
conditions is the lognormal distribution, various properties of which
are discussed in detail in an astrophysical context in (Coles & Jones
1991). It was later shown to be reproduced accurately, both from the
observational point of view as well as in comparison to standard
perturbation theory and N-body simulations (Bernardeau & Kofman 1995;
Bernardeau 1994; Kayo et al. 2001; Taylor & Watts 2003; Wild et al.
2005), in low dimensional settings.

More recently, it was used as a starting point for an attempt at a
better description of clustering (Kitaura 2010). The lognormal
assumption is also very much compatible with numerical works (Neyrinck
et al. 2009, 2011) showing that the spectrum of the logarithm of the
field, $\ln(1+\delta)$, carries much more information than the spectrum
of $\delta$ itself. The first evaluation of the former within the
framework of perturbation theory appeared recently (Wang et al. 2011).



Lognormal statistics (see Aitchison & Brown 1957 for a textbook
presentation) are not innocuous. More specifically, the lognormal
distribution is only one among many distributions that lead to the very
same series of moments. This fact indicates that, going from the
distribution to the moments, one may be losing information in some way
or another. A fundamental limitation of the correlation function
hierarchy in extracting the information content of the field in the
non-linear regime could therefore exist, if its statistics are indeed
similar to the lognormal. This important fact was already mentioned
qualitatively in (Coles & Jones 1991), but it seems no quantitative
analysis is available at present.
It is the purpose of this paper to provide first answers to these
issues, in terms of Fisher information, looking at the details of the
structure of the information within the lognormal field. It is built out
of two main parts.



The first deals exclusively with the case of a single lognormal
variable, illustrating the main aspects we want to point out in this
work. We begin by presenting how to identify the independent bits of
information that are contained in the successive moments of a
distribution, with the help of orthogonal polynomials. We discuss the
properties of this decomposition that are relevant for our purposes. The
procedure is very similar to the decompositions presented in (Jarrett
1984), to which we refer for a more complete discussion of the
properties of such expansions. In a second step, we perform this
decomposition for the lognormal distribution, which can be obtained
exactly at all orders in terms of q-series, due to its convenient
analytical properties. It contains one of our main results, presented in
figure 1: when the variance of the fluctuations of the lognormal
distribution reaches unity, essentially the entirety of its information
content cannot be accessed anymore through a study of its moments, even
if the complete series of moments could be extracted. We then delve a
little more into the details of that phenomenon, and show that this is
due to the inability of polynomials to reproduce the logarithm function,
leading to missing bits of information in the part of the distribution
describing the underdense regions. Finally, we compare our results for
the lower order moments with results from standard perturbation theory
(Bernardeau 1994), and find good agreement over several orders of
magnitude in the variance.



The second part extends the analysis to the multivariate lognormal
distribution, and to the continuous limit of the lognormal field. The
decomposition of the information into uncorrelated pieces is
conceptually identical. Due to the highly increased formal complexity,
the explicit expressions for the information content of the n-point
correlations are however of less practical use. We therefore focus on
two simpler situations that can be dealt with analytically: the
extraction of the mean of a lognormal field, and of its two-point
correlations.

We then summarise the results, and conclude with a discussion of the
conditions for the correlation function hierarchy to form a complete or
incomplete, efficient or inefficient set of probes of a random field.
The appendix contains a short series of technical details regarding the
material presented in the main text.

^ Throughout this work, by 'information' is meant, more rigorously,
Fisher information.



2. One variable 

For a variable $X$ with probability density function $p(x,\alpha)$,
$\alpha$ any parameter, we assume that all moments exist, and write for
them

    m_n = \langle x^n \rangle, \quad n = 0, 1, \dots    (1)

and for the associated covariance matrix

    \Sigma_{ij} = m_{i+j} - m_i m_j.    (2)

Since $p(x,\alpha)$ is normalised to unity for any value of the
parameter, we have

    \frac{\partial m_0}{\partial \alpha} = \langle s(x,\alpha) \rangle = 0,    (3)

where $s(x,\alpha) = \partial_\alpha \ln p(x,\alpha)$ is the score
function. Fisher's measure of information (Fisher 1925; van den Bos
2007) on the parameter $\alpha$ is then defined as the variance of the
score function,

    F_\alpha = \langle s^2(x,\alpha) \rangle.    (4)

The Fisher information density

    s^2(x,\alpha)\, p(x,\alpha)\, dx = \frac{\left(\partial_\alpha p(x,\alpha)\right)^2}{p(x,\alpha)}\, dx    (5)

is the amount of information associated with observations of
realizations of the variable in the range $(x, x+dx)$. Through the
Cramér-Rao inequality (Rao 1973), $F_\alpha$ itself has acquired the
widespread interpretation in cosmology of approximating the error bars
that the experiment under consideration will be able to put on the
parameter $\alpha$ (e.g. Tegmark et al. 1997).



2.1. Fisher information and orthogonal polynomials

The decomposition of the information into independent pieces associated
with each moment relies on the approximation of the score function
through orthonormal polynomials. For each natural number $n$, $P_n$ is a
polynomial of degree $n$ and

    \langle P_m(x) P_n(x) \rangle = \delta_{mn}.    (6)

These polynomials can always be constructed for a given distribution and
are unique up to a sign, which is fixed by requiring the coefficient of
$x^n$ in $P_n$ to be positive. We refer to the textbooks (Szegő 2003;
Freud 1971) for the general theory of orthogonal polynomials. These
polynomials can be written in the monomial basis with the help of a
triangular transition matrix $C$ that we will use later on,

    P_n(x) = \sum_{m=0}^{n} C_{nm}\, x^m.    (7)



According to equation (6), the non-constant orthonormal polynomials
average to zero. As the value of the model parameter changes, these
averages take non-vanishing values, at a rate equal to the component of
the score function parallel to these polynomials:

    s_n := \frac{\partial \langle P_n(x) \rangle}{\partial \alpha} = \langle s(x,\alpha) P_n(x) \rangle,    (8)

where the relation $\partial_\alpha p(x,\alpha) = s(x,\alpha)
p(x,\alpha)$ was used. We argue that $s_n^2$ is precisely the
independent information content of the moment of order $n$. This can be
seen as follows. For any natural number $n$, it is not difficult to show
that the inverse covariance matrix of size $n$ is given by

    [\Sigma^{-1}]_{ij} = \sum_{k=1}^{n} C_{ki} C_{kj}, \quad i,j = 1, \dots, n.    (9)

Therefore, noting that from equation (7) and from the definition (8) of
the information coefficients $s_n$ we can write (the $k = 0$ term drops
out because $m_0 = 1$ is constant)

    s_n = \sum_{k=1}^{n} C_{nk} \frac{\partial m_k}{\partial \alpha},    (10)

the following relation holds,

    \sum_{i,j=1}^{n} \frac{\partial m_i}{\partial \alpha} [\Sigma^{-1}]_{ij} \frac{\partial m_j}{\partial \alpha} = \sum_{k=1}^{n} s_k^2.    (11)



This expression, weighting the sensitivity of the moments to the
parameter by the inverse covariance matrix, is the amount of information
present in the first $n$ moments, taking all correlations into account.
For instance, this is exactly the amount of information available if the
first $n$ moments were to be extracted with the help of unbiased,
Gaussian distributed estimators (e.g. Tegmark et al. 1997). Also, it is
simple to show that these coefficients are invariant under any linear
transformation of the field. This allows us to identify unambiguously
the $n$-th squared coefficient $s_n^2$ as the independent bits of
information contained in the moment of order $n$.

If the set of orthonormal polynomials forms a complete basis, the
partial sums

    \sum_{n=1}^{N} s_n P_n(x)    (12)

will tend, with increasing $N$, to reproduce accurately the score
function. By Parseval's identity, the full amount of Fisher information
can be written as

    F_\alpha = \sum_{n=1}^{\infty} s_n^2.    (13)



This last equation implies that the information contained in the full
set of moments is identical to the total amount of information. This is
in perfect agreement with expectations. A well known result due to M.
Riesz (Riesz 1923) in the theory of moments states that if the moment
problem associated with the moments $\{m_n\}_{n=0}^{\infty}$ is
determinate (i.e. the distribution giving rise to these moments is
uniquely determined by their values), then the set of associated
orthonormal polynomials is complete. Since the distribution is uniquely
determined, common sense would then require the total amount of
information contained in the distribution to be the same as the one
contained in the full set of moments.

However, moment problems are not always determinate, and orthonormal
polynomials associated with weight functions do not always form complete
sets. Therefore, the series

    \sum_{n=1}^{\infty} s_n^2    (14)

may not always converge to the total amount of Fisher information and,
if not, will always underestimate it. Namely, instead of Parseval's
identity, we have the Bessel inequality,

    0 \le \left\langle \left( \sum_{n=1}^{\infty} s_n P_n(x) - s(x,\alpha) \right)^2 \right\rangle = F_\alpha - \sum_{n=1}^{\infty} s_n^2.    (15)



In words, the mean squared error in approximating the score function
with polynomials is the amount of Fisher information absent from the
full set of moments. As emphasized already in an astrophysical context
by (Coles & Jones 1991) and stated in our introduction, the moments of
the lognormal distribution are precisely an example of an indeterminate
moment problem. In fact, a whole family of distributions, given
explicitly in (Heyde 1963), have the very same series of moments. In
light of these considerations, our subsequent results cannot be
considered surprising.
Before turning to the actual calculation of the coefficients $s_n$ of
the lognormal distribution, let us just state that when the score
function is itself a polynomial, it is clear that the series (12)
actually terminates,

    s_k = 0, \quad k > n,    (16)

where $n$ is the order of the polynomial representing the score
function. The prime example is the Gaussian distribution, for which
$n = 2$, with the Hermite polynomials as associated orthonormal
polynomials. The well known fact that the mean and the variance of the
Gaussian distribution carry all of the information becomes within our
framework that only the coefficients $s_1$ and $s_2$ are non-zero.
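
The Gaussian statement above can be checked directly from the moments. The following is a minimal sketch, not part of the original analysis: it evaluates the left-hand side of relation (11) in exact rational arithmetic for a zero-mean, unit-variance Gaussian whose parameter is assumed to be the variance itself, so that the total Fisher information is 1/2. The helper names `solve` and `moment_information` are hypothetical.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the linear system a x = b exactly (Gauss-Jordan on Fractions)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][n] for i in range(n)]

def moment_information(m, dm, n):
    """Information in the first n moments: d^T Sigma^{-1} d, as in eq. (11).

    m[k] is the moment m_k and dm[k] its derivative with respect to the
    parameter; moments up to order 2n are required."""
    cov = [[m[i + j] - m[i] * m[j] for j in range(1, n + 1)]
           for i in range(1, n + 1)]
    d = [dm[i] for i in range(1, n + 1)]
    x = solve(cov, d)
    return sum(di * xi for di, xi in zip(d, x))

def double_factorial(k):
    return 1 if k <= 0 else k * double_factorial(k - 2)

# Zero-mean Gaussian with variance v = 1: m_k = (k-1)!! for even k, else 0.
# Assumed parameter: alpha = v, so that dm_k/dv = (k/2) m_k at v = 1.
m = [Fraction(double_factorial(k - 1)) if k % 2 == 0 else Fraction(0)
     for k in range(9)]
dm = [Fraction(k, 2) * m[k] for k in range(9)]

# Information in the first 1, 2, 3 and 4 moments: 0, 1/2, 1/2, 1/2.
info = [moment_information(m, dm, n) for n in range(1, 5)]
print(info)
```

The information saturates at the second moment and equals the total Fisher information $1/(2v^2) = 1/2$, so that all coefficients beyond $s_2$ vanish, as expected for a polynomial score function of degree two.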

2.2. Basic properties of the lognormal distribution

The variable $X$ has a lognormal distribution when $Y = \ln X$ is a
normal variable, with mean $\mu_Y$ and variance $\sigma_Y^2$. The
dependency on a model parameter $\alpha$ can enter one or both of these
parameters. The moments of $X$ are given by Gaussian integrals and read
explicitly

    m_n = \exp\left( n \mu_Y + \frac{1}{2} n^2 \sigma_Y^2 \right).    (17)

The mean and variance of $Y$ therefore relate to the mean $\mu$ and
variance $\sigma^2$ of $X$ according to

    \sigma_Y^2 = \ln(1 + \sigma_\delta^2), \qquad \mu_Y = \ln \mu - \frac{1}{2} \ln(1 + \sigma_\delta^2),    (18)

where $\sigma_\delta^2$ is the variance of the fluctuations of $X$,

    \sigma_\delta^2 = \frac{\sigma^2}{\mu^2}.    (19)



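Since $Y$ has a Gaussian distribution and Fisher's measure is invariant
under invertible transformations, the total Fisher information content
of $X$ is given by the well known expression for the Gaussian
distribution,

    F_\alpha = \frac{1}{\sigma_Y^2} \left( \frac{\partial \mu_Y}{\partial \alpha} \right)^2 + \frac{1}{2 \sigma_Y^4} \left( \frac{\partial \sigma_Y^2}{\partial \alpha} \right)^2.    (20)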



The key parameter throughout this part of this work will be the quantity
$q$, defined as

    q := \frac{1}{1 + \sigma_\delta^2} = e^{-\sigma_Y^2}.    (21)

Note that $q$ is strictly positive and smaller than unity. The regime of
small fluctuations, where the lognormal distribution is very close to
the Gaussian distribution, is described by values of $q$ close to unity.
Deep in the non-linear regime, it tends to zero. These two regimes are
conveniently separated at $q = 1/2$, corresponding to fluctuations of
unit variance. We note the following convenient property of the moments
for further reference,

    m_{i+j} = m_i\, m_j\, q^{-ij}.    (22)

2.3. Information coefficients

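From equations (10), (17), and (18), we see that the $n$-th information
coefficient $s_n$ is given by

    s_n = \frac{\partial \ln \mu}{\partial \alpha} \sum_{k=0}^{n} C_{nk}\, m_k\, k
          + \frac{1}{2 (1 + \sigma_\delta^2)} \frac{\partial \sigma_\delta^2}{\partial \alpha} \sum_{k=0}^{n} C_{nk}\, m_k\, k(k-1).    (23)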



Evaluation of the above sums can proceed in different ways. Notably, it
is possible to get an explicit formula for the orthonormal polynomials,
and therefore for the matrix $C$, for the lognormal distribution. These
are essentially the Stieltjes-Wigert polynomials (Wigert 1923; Szegő
2003). We will use their specific form later in this work, though they
are not needed for the purpose of evaluating $s_n$. We proceed with the
following trick: we introduce the q-shifted factorial, also called
q-Pochhammer symbol (Kac & Cheung 2001; Andrews et al. 1999, section
10), as

    (t : q)_n := \prod_{k=0}^{n-1} \left( 1 - t q^k \right),    (24)

$t$ a real number, and prove in the appendix that the following curious
identity holds,

    \langle P_n(tx) \rangle = (-1)^n\, q^{n/2}\, \frac{(t : q)_n}{\sqrt{(q : q)_n}}.    (25)

By virtue of

    \langle P_n(tx) \rangle = \sum_{k=0}^{n} C_{nk}\, m_k\, t^k,    (26)



it follows from our identity (25) that the sums given on the right hand
side of equation (23) are proportional to the first, respectively the
second, derivative of the q-Pochhammer symbol evaluated at $t = 1$.
Besides, matching the powers of $t$ on both sides of equation (25)
provides immediately the explicit expression for the matrix elements
$C_{nk}$.
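
The identity relating $\langle P_n(tx) \rangle$ to the q-Pochhammer symbol lends itself to a quick numerical check. The sketch below is an illustration under stated assumptions, not the appendix proof: it takes $\mu = 1$ and $q = 1/2$, builds the matrix $C$ as the inverse Cholesky factor of the Hankel matrix of moments (one standard way to orthonormalize monomials, not necessarily the route taken in the paper), and compares both sides of the identity as we read it, $\langle P_n(tx) \rangle = (-1)^n q^{n/2} (t:q)_n / \sqrt{(q:q)_n}$. All function names are hypothetical.

```python
import math

def cholesky(h):
    """Lower-triangular L with L L^T = h, for a positive definite matrix h."""
    n = len(h)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = h[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def lower_inverse(low):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / low[j][j]
        for i in range(j + 1, n):
            acc = sum(low[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = -acc / low[i][i]
    return inv

def q_pochhammer(t, q, n):
    """(t : q)_n = prod_{k=0}^{n-1} (1 - t q^k)."""
    out = 1.0
    for k in range(n):
        out *= 1.0 - t * q**k
    return out

q, N = 0.5, 3
# Lognormal moments for mean mu = 1 (assumption): m_k = q^{-k(k-1)/2}.
m = [q ** (-k * (k - 1) / 2) for k in range(2 * N + 1)]
hankel = [[m[i + j] for j in range(N + 1)] for i in range(N + 1)]
# Rows of C hold the monomial coefficients of the orthonormal polynomials.
C = lower_inverse(cholesky(hankel))

def lhs(n, t):
    """<P_n(t x)> evaluated from the moments, as in eq. (26)."""
    return sum(C[n][k] * m[k] * t**k for k in range(n + 1))

def rhs(n, t):
    """Closed form of eq. (25) as reconstructed here."""
    norm = math.sqrt(q_pochhammer(q, q, n))
    return (-1) ** n * q ** (n / 2) * q_pochhammer(t, q, n) / norm

for n in range(N + 1):
    print(n, lhs(n, 0.3), rhs(n, 0.3))
```

In particular both sides vanish at $t = q^{-j}$, $j = 0, \dots, n-1$, which is simply the orthogonality of $P_n$ to the lower powers of $x$ combined with property (22).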



We distinguish explicitly two situations, labelled by an index $a$
taking values $\mu$ or $\sigma$, where only one of the two parameters of
the lognormal distribution actually depends on $\alpha$. The general
case is reconstructed trivially from these two.

Case $a = \mu$. We assume in this case that the parameter enters the
mean of the distribution only,

    \frac{\partial \sigma_\delta^2}{\partial \alpha} = 0.    (27)

From (23), we see that the derivative of $\mu$ with respect to $\alpha$
only plays the role of an overall normalization constant. Since we will
deal exclusively with ratios, it is irrelevant for our purposes. We
choose for convenience

    \frac{\partial \ln \mu}{\partial \alpha} = 1.    (28)

The total amount of information in the distribution becomes, from (20)
and (18),

    F_\mu := \frac{1}{\ln(1 + \sigma_\delta^2)}.    (29)
Case $a = \sigma$. The parameter enters the variance of the distribution
only, and we pick again a convenient normalization of its derivative,

    \frac{1}{2 (1 + \sigma_\delta^2)} \frac{\partial \sigma_\delta^2}{\partial \alpha} = 1, \qquad \frac{\partial \ln \mu}{\partial \alpha} = 0.    (30)

This situation is the most common in cosmology, for instance for any
model parameter entering the matter power spectrum. The exact amount of
information becomes, again from (20) and (18),

    F_\sigma := \frac{1}{\ln(1 + \sigma_\delta^2)} + \frac{2}{\ln^2(1 + \sigma_\delta^2)}.    (31)



In both of these situations, we obtain the information coefficients by
differentiating once, respectively twice, our relation (25) with respect
to $t$ and evaluating these derivatives at $t = 1$. The result is, in
the $a = \mu$ case,

    s_n^\mu = (-1)^{n-1} \sqrt{ \frac{q^n}{1 - q^n}\, (q : q)_{n-1} },    (32)

and for $a = \sigma$,

    s_n^\sigma = -2\, s_n^\mu \sum_{k=1}^{n-1} \frac{q^k}{1 - q^k}, \quad n > 1,    (33)

whereas $s_1^\sigma$ is easily seen to vanish from its definition.
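
The closed forms for the information coefficients can be cross-checked against the covariance-weighted expression (11). The sketch below is only a consistency check under stated assumptions: it takes $\mu = 1$ and $q = 1/2$, uses the normalizations (28) and (30) so that the moment derivatives are $k\,m_k$ and $k(k-1)\,m_k$ respectively, and works in exact rational arithmetic. The squared coefficients are the ones we read from (32) and (33); the function names are hypothetical.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the linear system a x = b exactly (Gauss-Jordan on Fractions)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][n] for i in range(n)]

def q_factorial(q, n):
    """(q : q)_n = prod_{k=1}^{n} (1 - q^k)."""
    out = Fraction(1)
    for k in range(1, n + 1):
        out *= 1 - q**k
    return out

def s2_mu(q, n):
    """Square of eq. (32)."""
    return q**n * q_factorial(q, n - 1) / (1 - q**n)

def s2_sigma(q, n):
    """Square of eq. (33); vanishes for n = 1."""
    h = sum(q**k / (1 - q**k) for k in range(1, n))
    return 4 * s2_mu(q, n) * h**2

def info_in_moments(q, n, case):
    """Left-hand side of eq. (11) for the lognormal with mean mu = 1."""
    m = [q ** (-(k * (k - 1) // 2)) for k in range(2 * n + 1)]
    weight = (lambda k: k) if case == "mu" else (lambda k: k * (k - 1))
    d = [weight(k) * m[k] for k in range(1, n + 1)]
    cov = [[m[i + j] - m[i] * m[j] for j in range(1, n + 1)]
           for i in range(1, n + 1)]
    x = solve(cov, d)
    return sum(di * xi for di, xi in zip(d, x))

q = Fraction(1, 2)
for n in range(1, 5):
    print(n,
          info_in_moments(q, n, "mu"), sum(s2_mu(q, k) for k in range(1, n + 1)),
          info_in_moments(q, n, "sigma"), sum(s2_sigma(q, k) for k in range(1, n + 1)))
```

Exact arithmetic is used on purpose: the moments of the lognormal grow like $q^{-k(k-1)/2}$, so the covariance matrix becomes badly conditioned in floating point already for moderate $n$.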



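2.4. Incompleteness of the information in the moments

The series

    \sum_{n=1}^{\infty} (s_n^a)^2, \quad a = \mu, \sigma,    (34)

are the total amount of information contained in the full series of
moments, in the respective cases described above. The ratios
$\epsilon_\mu$ and $\epsilon_\sigma$, defined as

    \epsilon_a := \frac{1}{F_a} \sum_{n=1}^{\infty} (s_n^a)^2,    (35)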

are the fraction of the information that can be accessed by extraction
of the full set of moments of $X$. The two asymptotic regimes of very
small and very large fluctuation variance can be analysed without
difficulty. In both cases, the first non-vanishing term of the
corresponding series completely dominates its value. For very small
variance, or equivalently $q$ very close to unity, $\epsilon_a$ tends to
unity, illustrating the fact that the distribution becomes arbitrarily
close to Gaussian: all the information is contained in the first two
moments. The large variance regime is more interesting: even though the
information coefficients decay very sharply as well, the series (34) are
far from converging to the corresponding expressions (29) and (31) for
the total amount of information. Considering only the dominant first
term in the relevant series and setting $q \to 0$, one obtains

    \epsilon_\mu \to \frac{1}{\sigma_\delta^2} \ln(1 + \sigma_\delta^2),    (36)

and a much more dramatic decay of $\epsilon_\sigma$:

    \epsilon_\sigma \to \frac{4}{\sigma_\delta^8}\, \frac{\ln^2(1 + \sigma_\delta^2)}{2 + \ln(1 + \sigma_\delta^2)}.    (37)
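
The accessible fractions are easy to evaluate numerically from the series. The sketch below is a minimal illustration, assuming the squared coefficients $(s_n^\mu)^2 = q^n (q:q)_{n-1}/(1-q^n)$ and $(s_n^\sigma)^2 = 4 (s_n^\mu)^2 [\sum_{k<n} q^k/(1-q^k)]^2$ together with the total information $F_\mu = 1/\ln(1+\sigma_\delta^2)$ and $F_\sigma = 1/\ln(1+\sigma_\delta^2) + 2/\ln^2(1+\sigma_\delta^2)$ as we read them from the text. The function name and the truncation order `n_max` are hypothetical choices.

```python
import math

def accessible_fractions(sigma_delta, n_max=200):
    """Return (eps_mu, eps_sigma), the fractions of Fisher information
    accessible through the full series of moments of a lognormal variable
    with fluctuation standard deviation sigma_delta."""
    var = sigma_delta**2
    q = 1.0 / (1.0 + var)
    log_term = math.log1p(var)                     # ln(1 + sigma_delta^2)
    f_mu = 1.0 / log_term
    f_sigma = 1.0 / log_term + 2.0 / log_term**2
    poch = 1.0   # running value of (q : q)_{n-1}
    harm = 0.0   # running value of sum_{k=1}^{n-1} q^k / (1 - q^k)
    sum_mu = sum_sigma = 0.0
    for n in range(1, n_max + 1):
        s2_mu = q**n * poch / (1.0 - q**n)
        sum_mu += s2_mu
        sum_sigma += 4.0 * s2_mu * harm**2
        poch *= 1.0 - q**n
        harm += q**n / (1.0 - q**n)
    return sum_mu / f_mu, sum_sigma / f_sigma

for sigma_delta in (0.1, 0.3, 1.0, 3.0, 10.0):
    eps_mu, eps_sigma = accessible_fractions(sigma_delta)
    print(f"{sigma_delta:5.1f}  eps_mu = {eps_mu:.4g}  eps_sigma = {eps_sigma:.4g}")
```

Both fractions stay at or below unity, as required by the Bessel inequality, and for large variance $\epsilon_\mu$ approaches $\ln(1+\sigma_\delta^2)/\sigma_\delta^2$ while $\epsilon_\sigma$ falls off far more steeply.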



Both series given in (34) are quickly convergent and well suited for
numerical evaluation. Figure 1 shows the accessible fractions
$\epsilon_a$ of information through extraction of the full series of
moments. Figure 2 shows the repartition of this accessible fraction
among the first 10 moments. Most relevant from a cosmological point of
view in figure 1 is the solid line, dealing with the case of the
parameters of interest entering the variance only. These figures show
clearly that the moments, as probes of the lognormal matter field, are
penalized by two different processes. First, as soon as the field shows
non-linear features, following equations (36) and (37), almost the
entirety of the information content cannot be accessed anymore by
extracting its successive moments. Within a range of one order of
magnitude in the variance, the moments go from very efficient probes to
highly inefficient ones. Second, as shown in figure 2, as the variance
of the field approaches unity, this accessible fraction gets quickly
transferred from the variance alone to higher order moments. This
repartition of the information within the moments is built out of two
different regimes. First, for large variance, or large $n$, we see
easily from the above expressions (32) and (33) that in both cases the
information coefficients decay exponentially,

    s_n^2 \propto (1 + \sigma_\delta^2)^{-n}, \qquad n \ln(1 + \sigma_\delta^2) \gg 1.    (38)
On the other hand, if the variance or $n$ is small enough, we can set
$1 - q^n \approx -n \ln q$, and we obtain, very roughly,



oc \n 



    n \ln(1 + \sigma_\delta^2) \ll 1,    (39)



explaining the trend with variance seen in figure 2 that puts more
importance on higher order moments as the variance grows. Note that the
latter regime can occur only for small enough values of the variance.
Deeper in the non-linear regime, the trend is therefore reversed,
obeying (38) for all values of $n$, with a steeper decay for higher
variance.

2.5. A q-analog of the logarithm

These results show clearly that large parts of the information become
invisible to the moments. However, they do not tell us what is
responsible for this phenomenon. It is therefore of interest to look a
little more into the details of these missing pieces of information. As
we have seen, these are due to the inability of the polynomials to
reconstruct precisely the score function. In the case $a = \mu$, the
score function of the lognormal distribution is easily shown to take the
form of a logarithm in base $q$,

    s(x) = -\ln_q\left( q^{-1/2}\, \frac{x}{\mu} \right),    (40)

where $\ln_q$ denotes the logarithm in base $q$. Therefore the series

    s_\infty(x) := \sum_{n=0}^{\infty} s_n P_n(x)    (41)



will represent some function, very close to a logarithm for $q \to 1$
over the range of $p(x,\alpha)$. It will however fail to reproduce some
of its features at lower $q$-values. This is hardly surprising, since it
is well known that the logarithm function does not have a Taylor
expansion over the full positive axis. For this reason, the
approximation $s_\infty(x)$ of $s(x)$ through polynomials can indeed
only fail when the fluctuation variance becomes large enough. In the
appendix, we show that $s_\infty(x)$ takes the form

    s_\infty(x) = -\sum_{k=1}^{\infty} \frac{q^k}{1 - q^k} \left[ 1 + (-1)^k\, \frac{q^{k(k-1)}}{(q : q)_k} \left( \frac{x}{\mu} \right)^k \right].    (42)
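
The difference between the exact score function and its polynomial approximation can be made concrete numerically. The sketch below is an illustration under stated assumptions: it uses the $a = \mu$ score written as $s(x) = 1/2 - \log_q(x/\mu)$ and the series as we read it, $s_\infty(x) = -\sum_{k \ge 1} \frac{q^k}{1-q^k} [1 + (-1)^k \frac{q^{k(k-1)}}{(q:q)_k} (x/\mu)^k]$, with the moments $\langle (x/\mu)^k \rangle = q^{-k(k-1)/2}$. Function names and the truncation order `k_max` are hypothetical.

```python
import math

def score_exact(x_over_mu, q):
    """Exact score in the a = mu case: a logarithm in base q."""
    return 0.5 - math.log(x_over_mu) / math.log(q)

def score_polynomial(x_over_mu, q, k_max=60):
    """Polynomial approximation s_infinity(x), truncated at k_max terms."""
    total, poch = 0.0, 1.0
    for k in range(1, k_max + 1):
        poch *= 1.0 - q**k                      # (q : q)_k
        bracket = 1.0 + (-1) ** k * q ** (k * (k - 1)) / poch * x_over_mu**k
        total -= q**k / (1.0 - q**k) * bracket
    return total

def mean_of_polynomial_score(q, k_max=60):
    """<s_infinity>, using <(x/mu)^k> = q^{-k(k-1)/2}; should vanish."""
    total, poch = 0.0, 1.0
    for k in range(1, k_max + 1):
        poch *= 1.0 - q**k
        moment_term = (-1) ** k * q ** (k * (k - 1) / 2) / poch
        total -= q**k / (1.0 - q**k) * (1.0 + moment_term)
    return total

q = 0.5  # unit fluctuation variance
for x in (1e-3, 1e-2, 1e-1, 1.0):
    print(x, score_exact(x, q), score_polynomial(x, q))
print("mean of s_infinity:", mean_of_polynomial_score(q))
```

In the underdense regime the exact score keeps growing logarithmically while the polynomial approximation saturates at the finite value $-\sum_k q^k/(1-q^k)$. The vanishing mean is a useful sanity check, since any projection onto the non-constant orthonormal polynomials must average to zero.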

It is interesting to note that this series expansion is almost identical
to the one of the q-analog of the logarithm $S_q(x)$ defined by E.
Koelink and W. Van Assche, with the only difference being the
replacement of $q^{k(k-1)/2}$ by $q^{k(k-1)}$ (see (?), and also
(Gautschi 2008)). Due to this replacement, $s_\infty$ does not possess
several properties that $S_q$ has and that make it a real q-analog of
the logarithm, such as $S_q(q^{-n}) = n$ for positive integers $n$. The
qualitative behavior of $s_\infty$ stays however close to that of $S_q$.
Notably, its behavior in underdense regions, $x/\mu \ll 1$, where as
seen from (42) $s_\infty$ tends to a finite value, is very different
from that of a logarithm.

Fig. 1. — The fraction of the total information content that is
accessible through extraction of the full series of moments of the
lognormal field, as function of the square root of the variance of the
fluctuations.

Fig. 2. — The distribution of the information within the first 10
moments of the lognormal field, given by the coefficients
$(s_n^\sigma)^2$, equation (33), normalized to the information content
of the second moment, for three different values of $\sigma_\delta$
(panels: $\sigma_\delta$ = 0.1, 0.3, 0.8). Note that deeper in the
non-linear regime, the trend is reversed.

This calculation can be performed as well in the case $a = \sigma$, with
similar conclusions. Since it is rather tedious and not very
enlightening, we do not reproduce it in these pages. We show in figure 3
the information density, equation (5), of the lognormal distribution
(dashed line), and its approximation by the orthogonal polynomials
(solid line),



(43) 



when the fluctuation variance $\sigma_\delta^2$ is equal to unity. It is
clear from this figure that in this regime, while most of the
information is located within the underdense regions of the lognormal
field, the moments are unable to catch it.
To check the correctness of our numerical and analytical calculations,
we compared the total information content as evaluated from integrating
the information densities in figure 3 to the one given by equation (31),
respectively (34), with essentially perfect agreement.

2.6. Comparison to standard perturbation theory

For any distribution, the knowledge of its first $2n$ moments allows
directly, for instance from equation (11), the evaluation of the
independent information content of the first $n$ moments. This holds
even if the exact shape of the distribution is not known, or too
complicated. In particular, we can use the explicit expressions for the
first six moments of the density fluctuation field within the framework
of standard perturbation theory (SPT) provided by F. Bernardeau in
(Bernardeau 1994), in order to compare $s_2$ and $s_3$ as given from SPT
to their lognormal analogs.



We note that a comparison to (Bernardeau 1994) can only be very
incomplete and, to some extent, it can only fail. It is indeed part of
the approach in (Bernardeau 1994), when producing functional forms for
the distribution of the fluctuation field, to invert the relation
between a moment generating function and its probability density
function. For such an inversion to be possible it is of course necessary
that the probability density is uniquely determined by its moments. As
said, this is not the case for the lognormal distribution. Therefore,
that approach can never lead to an exact lognormal distribution, or to
any distribution for which the moment hierarchy forms an incomplete set
of probes. However, such a comparison can still lead to conclusions
relevant for many practical purposes, such as those dealing with the
first few moments.



The variance of the field is explicitly given as an integral over the
matter power spectrum,

    \sigma_\delta^2 = \frac{1}{2\pi^2} \int_0^\infty dk\, k^2 P(k,\alpha)\, |W(kR)|^2,    (44)

where $W(kR)$ is the Fourier transform of the real space top hat filter
of size $R$, and $\alpha$ is any cosmological parameter entering the
power spectrum $P(k)$. In the notation of (Bernardeau 1994), the moments
of the fluctuation field $m_n = \langle \delta^n \rangle$ are given by
the disconnected, or Gaussian, components, while the connected
components $\langle \delta^n \rangle_c$, $n \ge 3$, are given in terms
of parameters $S_n$,

    \langle \delta^n \rangle_c = S_n\, \sigma_\delta^{2(n-1)}.    (45)



The parameters $S_n$ contain a leading, scale independent coefficient,
and deviations from this scale independence are given in terms of the
logarithmic derivatives of the variance,

    \gamma_i = \frac{d^i \ln \sigma_\delta^2}{d \ln^i R}.    (46)
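
The variance integral and its first logarithmic derivative can be sketched numerically as follows. This is an illustration only: the power spectrum `toy_power` is a hypothetical scale-free placeholder $P(k) \propto k^{-1}$ with arbitrary amplitude, not the Eisenstein & Hu spectrum used in this work, and the integration grid is an arbitrary choice. For a pure power law $P \propto k^n$ the variance scales as $R^{-(n+3)}$, so $\gamma_1 = -(n+3)$ provides a known answer to compare against.

```python
import math

def tophat_window(x):
    """Fourier transform of the real-space top-hat filter, W(x), x = k R."""
    if x < 1e-3:
        return 1.0 - x * x / 10.0   # leading terms of the Taylor series
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x**3

def variance(radius, power, k_min=1e-4, k_max=1e3, n=20000):
    """sigma^2(R) = (1/2 pi^2) int dk k^2 P(k) |W(kR)|^2, evaluated with
    the trapezoidal rule on a logarithmic grid (dk = k dln k)."""
    dlnk = math.log(k_max / k_min) / n
    total = 0.0
    for i in range(n + 1):
        k = k_min * math.exp(i * dlnk)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * k**3 * power(k) * tophat_window(k * radius) ** 2
    return total * dlnk / (2.0 * math.pi**2)

def gamma_1(radius, power, h=1e-2):
    """First logarithmic derivative d ln sigma^2 / d ln R by central
    finite differences in ln R."""
    up = variance(radius * math.exp(h), power)
    down = variance(radius * math.exp(-h), power)
    return (math.log(up) - math.log(down)) / (2.0 * h)

def toy_power(k, index=-1.0):
    """Hypothetical scale-free spectrum, for illustration only."""
    return k**index

print(variance(1.0, toy_power), gamma_1(1.0, toy_power))
```

Higher derivatives $\gamma_i$ follow from the same finite-difference idea applied repeatedly, at the price of a growing sensitivity to the step size `h`.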



Neglecting the very weak dependence of $S_n$ on cosmology, from (45) we
can write



dnin 
da 



M 

da 




n ■■ 
n 

n ■ 



(47) 



With the coefficients $S_n$ up to $n = 6$ given in (Bernardeau 1994,
page 703), and the above relations, we performed a straightforward
evaluation of the information coefficients $s_2^2$ and $s_3^2$, using
equation (11). The variance was obtained from (44) within a flat
$\Lambda$CDM universe ($\Omega_\Lambda = 0.7$, $\Omega_m = 0.3$,
$\Omega_b = 0.045$, $h = 0.7$), with power spectrum parameters
($\sigma_8 = 0.8$, $n = 1$), and we used the transfer function from
Eisenstein and Hu (Eisenstein & Hu 1998). The needed derivatives
$\gamma_i$, $i = 1, \dots, 4$, were obtained numerically through finite
differences.



In figure 4 we show the ratio

    \frac{s_3^2}{s_2^2},    (48)

i.e. the relative importance of the third moment with respect to the
second, as function of the variance, both for the lognormal distribution
and the SPT predictions. This ratio is identically zero for a Gaussian
distribution. The models stand in good agreement over many orders of
magnitude. It is striking that both models consistently predict that at
the entrance of the non-linear regime, this ratio takes a maximal value
close to unity. Of course, the SPT curve for larger values of the
variance is hard to interpret, since it lies outside its domain of
validity.




Fig. 3. — The information density of the lognormal distribution,
dashed, and, solid, its approximation through the associated orthonormal
polynomials, in the $a = \sigma$ case, for fluctuations of unit
variance, as function of $\log_{10}(x/\mu)$. While most of the
information of the lognormal field in this regime is actually contained
in the underdense regions, the moments are essentially unable to catch
it.



3. Several variables 

So far, we have considered only one variable, and wish now to extend our
analysis to the more interesting multidimensional case. It is possible
to derive formally a general expression for the independent information
content of the n-point correlations of any distribution. This proceeds
in strict analogy with the one-dimensional case, with an expansion of
the score function in polynomials of several variables. It is presented
in some more detail in the appendix. For the lognormal field, a given
model parameter can only enter via the means of the logarithm of the
field at each point or through the elements of its two-point correlation
matrix. We could however not transform the corresponding expressions
into a useful, easily evaluated form in a general situation, as in the
one dimensional case. We focus therefore on two more restricted but
tractable situations. First, in analogy with the $a = \mu$ case, we
consider a parameter that enters the mean of the field $\bar\rho$ and no
elements of the correlation matrix. In this case, the complete amount of
information can be extracted via the mean of $\ln\rho$, and we compare
that amount to the one obtained by extracting the mean of $\rho$ only.
In a second step, we consider the extraction of the correlation
amplitude $\xi(r)$ between independent pairs of cells separated by the
distance $r$. In this case, the full amount of information on any
parameter impacting $\xi(r)$ is in the correlation of $\ln\rho$, and we
compare that to the more standard approach of the extraction of the
correlation of $\rho$.

Fig. 4. — The ratio of the independent information content of the third
moment to that of the second moment, for the lognormal field (dashed)
and standard perturbation theory (solid), as function of the square root
of the variance of the fluctuations.

3.1. Basic properties of the multivariate 
lognormal distribution 

The random vector $\rho = (\rho_1, \dots, \rho_d)$ has a multivariate
lognormal distribution if $\ln\rho := (\ln\rho_1, \dots, \ln\rho_d)$ is
normally distributed, for some mean vector $\overline{\ln\rho}$ and covariance
matrix $\xi_{\ln\rho}$. Since we have in mind the vector $\rho$ to be a sample
of a homogeneous lognormal field, we will use the notation

$$\rho_i = \rho(x_i), \quad i = 1, \dots, d, \tag{49}$$

the points $x_i$ lying in some $n$-dimensional space. The means
$\overline{\ln\rho_i}$ are the same at each point, and the covariance matrix
is a discrete version of the corresponding correlation function,

$$[\xi_{\ln\rho}]_{ij} = \xi_{\ln\rho}(x_i - x_j), \quad i,j = 1, \dots, d, \tag{50}$$

which depends only on the separation vector. For some vector of natural
numbers (multiindex) $\mathbf{j} = (j_1, \dots, j_d)$, the correlations

$$m_{\mathbf{j}} := \langle \rho^{\mathbf{j}} \rangle = \langle \rho(x_1)^{j_1} \cdots \rho(x_d)^{j_d} \rangle \tag{51}$$

are again given by simple Gaussian integrals and read explicitly

$$m_{\mathbf{j}} = \exp\left( \overline{\ln\rho} \cdot \mathbf{j} + \frac{1}{2}\, \mathbf{j} \cdot \xi_{\ln\rho}\, \mathbf{j} \right). \tag{52}$$

By the independent information content of the $n$-point correlations we mean
the independent bits of information within all the correlations of the same
order $n$, that is within all $m_{\mathbf{j}}$ such that

$$\sum_{i=1}^{d} j_i = n. \tag{53}$$

The convenient property between the moments becomes

$$m_{\mathbf{i}+\mathbf{j}} = m_{\mathbf{i}}\, m_{\mathbf{j}} \exp\left( \mathbf{i} \cdot \xi_{\ln\rho}\, \mathbf{j} \right) \equiv m_{\mathbf{i}}\, m_{\mathbf{j}}\, Q_{\mathbf{i}\mathbf{j}}. \tag{54}$$



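From these relations, one infers that the correlations $\xi$ of the
fluctuations of $\rho$, defined as

$$\xi(x_i - x_j) = \frac{1}{\bar\rho^2} \left\langle \left( \rho(x_i) - \bar\rho \right) \left( \rho(x_j) - \bar\rho \right) \right\rangle, \tag{55}$$

are related to those of $\ln\rho$ through

$$\xi_{\ln\rho}(r) = \ln\left[ 1 + \xi(r) \right]. \tag{56}$$

On the other hand, the means obey

$$\overline{\ln\rho} = \ln\bar\rho - \frac{1}{2}\, \xi_{\ln\rho}(0) = \ln\bar\rho - \frac{1}{2} \ln\left( 1 + \sigma_\delta^2 \right). \tag{57}$$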
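The relations (52), (54), (56) and (57) are easy to check numerically. The following is a minimal sketch for a pair of points; the function name `lognormal_moment` and all numerical values are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def lognormal_moment(j, mean_log, cov_log):
    """m_j = <rho_1^{j_1} ... rho_d^{j_d}> for a multivariate lognormal, eq. (52)."""
    j = np.asarray(j, dtype=float)
    return float(np.exp(mean_log @ j + 0.5 * j @ cov_log @ j))

# Hypothetical two-point configuration: mean and covariance of ln(rho).
# The numbers are arbitrary and only serve as an illustration.
mean_log = np.array([0.1, 0.1])
cov_log = np.array([[0.5, 0.2],
                    [0.2, 0.5]])

i, j = np.array([1, 0]), np.array([0, 1])
m_i = lognormal_moment(i, mean_log, cov_log)
m_j = lognormal_moment(j, mean_log, cov_log)
m_ij = lognormal_moment(i + j, mean_log, cov_log)

# Eq. (54): m_{i+j} = m_i m_j exp(i . cov_log . j)
factorised = m_i * m_j * np.exp(i @ cov_log @ j)

# Eq. (55): correlation of the fluctuations, xi = <rho_1 rho_2> / rho_bar^2 - 1
xi = m_ij / (m_i * m_j) - 1.0

# Eq. (57): mean of the log from the mean of the field and the variance
sigma2 = lognormal_moment([2, 0], mean_log, cov_log) / m_i**2 - 1.0
mean_log_recovered = np.log(m_i) - 0.5 * np.log1p(sigma2)

print(np.log1p(xi), cov_log[0, 1])      # eq. (56): both sides agree
print(mean_log_recovered, mean_log[0])  # eq. (57): both sides agree
```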



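The Fisher information content of $\rho$ is again given by the standard
expression for the Gaussian field $\ln\rho$. It splits into the part coming
from the correlations $\xi_{\ln\rho}$ and the one coming from the mean
$\overline{\ln\rho}$,

$$F_\alpha = \frac{1}{2} \mathrm{Tr}\left[ \xi_{\ln\rho}^{-1} \frac{\partial \xi_{\ln\rho}}{\partial\alpha}\, \xi_{\ln\rho}^{-1} \frac{\partial \xi_{\ln\rho}}{\partial\alpha} \right] + \frac{\partial \overline{\ln\rho}}{\partial\alpha} \cdot \xi_{\ln\rho}^{-1} \frac{\partial \overline{\ln\rho}}{\partial\alpha}. \tag{58}$$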

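We denote by $Q_N$ the square matrix defined as

$$[Q_N]_{\mathbf{i}\mathbf{j}} = \frac{m_{\mathbf{i}+\mathbf{j}}}{m_{\mathbf{i}}\, m_{\mathbf{j}}}, \quad |\mathbf{i}|, |\mathbf{j}| \le N, \tag{59}$$

which we will make future use of. We note that by virtue of equations (54)
and (56), its matrix elements read

$$[Q_N]_{\mathbf{i}\mathbf{j}} = \prod_{k,l=1}^{d} \left( 1 + \xi(x_k - x_l) \right)^{i_k j_l}. \tag{60}$$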



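A general expression for the independent information content of the
correlations of order $n$, equation (C8), and its link to the completeness of
the orthogonal polynomials is presented for completeness in the appendix. This
machinery is however not needed for the following considerations, which are
restricted to the two lowest order correlations. We will only use the analog
of equation (11), which gives the total amount of information contained in the
correlations up to order $N$,

$$\sum_{n=1}^{N} s_n^2 = \sum_{|\mathbf{i}|,|\mathbf{j}|=1}^{N} \frac{\partial m_{\mathbf{i}}}{\partial\alpha} \left[ \Sigma^{-1} \right]_{\mathbf{i}\mathbf{j}} \frac{\partial m_{\mathbf{j}}}{\partial\alpha}, \tag{61}$$

where $\Sigma$ is the covariance matrix

$$\Sigma_{\mathbf{i}\mathbf{j}} = m_{\mathbf{i}+\mathbf{j}} - m_{\mathbf{i}}\, m_{\mathbf{j}}, \quad |\mathbf{i}|, |\mathbf{j}| \le N. \tag{62}$$

For the lognormal distribution, the property (54) allows us to rewrite this
last expression in the equivalent form

$$\sum_{n=1}^{N} s_n^2 = \sum_{|\mathbf{i}|,|\mathbf{j}|=0}^{N} \frac{\partial \ln m_{\mathbf{i}}}{\partial\alpha} \left[ Q_N^{-1} \right]_{\mathbf{i}\mathbf{j}} \frac{\partial \ln m_{\mathbf{j}}}{\partial\alpha}. \tag{63}$$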
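The equivalence of the two forms (61) and (63) can be illustrated in the simplest setting. The sketch below assumes a single lognormal variable with unit mean whose variance $\sigma_\delta^2$ is itself the parameter $\alpha$, so that $m_n = (1+\alpha)^{n(n-1)/2}$ and $Q_{ij} = (1+\alpha)^{ij}$. This univariate setup, the helper names and the choice of exact rational arithmetic are assumptions made for illustration only.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def moment(n, alpha):
    """m_n = (1 + alpha)^{n(n-1)/2}: lognormal moments for unit mean, variance alpha."""
    return (1 + alpha) ** (n * (n - 1) // 2)

def dmoment(n, alpha):
    """Derivative of m_n with respect to alpha."""
    k = n * (n - 1) // 2
    return k * (1 + alpha) ** (k - 1) if k > 0 else Fraction(0)

def info_from_covariance(N, alpha):
    """Form (61): derivatives of the moments and the inverse covariance matrix (62)."""
    idx = range(1, N + 1)
    cov = [[moment(i + j, alpha) - moment(i, alpha) * moment(j, alpha) for j in idx]
           for i in idx]
    dm = [dmoment(i, alpha) for i in idx]
    return sum(d * x for d, x in zip(dm, solve(cov, dm)))

def info_from_Q(N, alpha):
    """Form (63): logarithmic derivatives and the inverse of Q, indices from 0 to N."""
    idx = range(0, N + 1)
    Q = [[(1 + alpha) ** (i * j) for j in idx] for i in idx]
    dlnm = [dmoment(i, alpha) / moment(i, alpha) for i in idx]
    return sum(d * x for d, x in zip(dlnm, solve(Q, dlnm)))

alpha = Fraction(1)  # unit variance
for N in range(1, 5):
    print(N, info_from_covariance(N, alpha), info_from_Q(N, alpha))
```

Exact fractions are used here because the moment matrices of the lognormal distribution become badly conditioned very quickly as $N$ grows.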



3.2. Extraction of the mean 



In this case, we set

$$\frac{\partial \xi(r)}{\partial\alpha} = 0 \tag{64}$$

for any argument. Again, the actual value of the derivative of the mean with
respect to $\alpha$ will play no role. This condition (64) implies that
$\partial_\alpha \xi_{\ln\rho}$ vanishes as well. We can read out from
equation (58) that the total amount of information is given by

$$F_\alpha = \left( \frac{\partial \overline{\ln\rho}}{\partial\alpha} \right)^2 \sum_{i,j=1}^{d} \left[ \xi_{\ln\rho}^{-1} \right]_{ij}. \tag{65}$$

We stress that since $\ln\rho$ is Gaussian, the information is accessible in
its entirety by extraction of the mean of $\ln\rho$. On the other hand, the
amount extracted by looking at the mean of $\rho$ itself is equation (61) with
$N = 1$. Using the definition of $\xi$ in equation (55), it becomes

$$s_1^2 = \left( \frac{\partial \ln\bar\rho}{\partial\alpha} \right)^2 \sum_{i,j=1}^{d} \left[ \xi^{-1} \right]_{ij}. \tag{66}$$

In the limit of a continuous sample $d \to \infty$, the sum

$$\sum_{i,j=1}^{d} \left[ \xi^{-1} \right]_{ij} \tag{67}$$

becomes a double integral over space, which can be performed by Fourier
transformation, and is the inverse of the power spectrum of the field at zero
argument,

$$\sum_{i,j=1}^{d} \left[ \xi^{-1} \right]_{ij} \propto \frac{1}{P_\rho(k = 0)}, \quad P_\rho(0) = \int_V d^n r\, \xi(r), \tag{68}$$

and similarly for $\xi_{\ln\rho}$. We conclude that the loss of information by
looking at the mean of the field only is given straightforwardly by the ratio
of the power of the fields $\ln\rho$ and $\rho$ at zero argument,

$$\epsilon := \frac{s_1^2}{F_\alpha} = \frac{P_{\ln\rho}(0)}{P_\rho(0)}. \tag{69}$$

From the explicit representation of $P_{\ln\rho}(0)$,

$$P_{\ln\rho}(0) = \int d^n r\, \ln\left( 1 + \xi(r) \right), \tag{70}$$

and the fact that $\ln(1 + x)$ is strictly smaller than $x$ whenever
$x \neq 0$, it follows that the loss of information always occurs, but is
substantial only if the correlation function takes substantial values. However
the information loss $\epsilon$ is roughly insensitive to the presence of some
correlation scale. We see that the presence of correlations does not alter the
main conclusions drawn in the first part of this work.
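As an illustration of the ratio of zero-mode powers $P_{\ln\rho}(0)/P_\rho(0)$ discussed in this section, the sketch below evaluates it for a toy correlation function $\xi(r) = A\, e^{-r/r_0}$ in one dimension. The exponential form, the one-dimensional geometry and the function names are assumptions chosen purely for illustration; they are not taken from the analysis above.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def mean_information_fraction(amplitude, r0=1.0, n_points=200_001):
    """epsilon = P_lnrho(0) / P_rho(0) for the toy model xi(r) = amplitude * exp(-r / r0)."""
    r = np.linspace(0.0, 40.0 * r0, n_points)  # the tail beyond 40 r0 is negligible
    xi = amplitude * np.exp(-r / r0)
    p_rho = trapezoid(xi, r)                   # integral of xi(r)
    p_lnrho = trapezoid(np.log1p(xi), r)       # integral of ln(1 + xi(r))
    return p_lnrho / p_rho

for amplitude in (0.1, 1.0, 10.0):
    print(amplitude, mean_information_fraction(amplitude))
```

In this toy model the correlation length $r_0$ cancels exactly in the ratio, and the fraction depends only on the amplitude $A$, decreasing as $A$ grows.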



3.3. Extraction of correlations 

We suppose now that the parameter $\alpha$ enters the correlation function
$\xi$ for some argument $r$. Since the field is lognormal, measurement of
$\xi_{\ln\rho}(r)$ captures all the information on $\alpha$. We want to
compare this amount to the one extracted by measuring $\xi(r)$ itself. We
suppose further that $\xi(r)$ is extracted from a number of independent pairs
of points separated by the distance $r$. The independence of the pairs allows
us to simplify the problem drastically, since by additivity of the information
the information loss will be independent of the number of pairs. Our problem
thus becomes two-dimensional. Our assumptions on the impact of the parameter
$\alpha$ are, more explicitly,

$$\frac{\partial \ln\bar\rho}{\partial\alpha} = \frac{\partial \sigma_\delta^2}{\partial\alpha} = 0, \quad \frac{\partial \xi(r)}{\partial\alpha} \neq 0. \tag{71}$$



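We point out that this is very different from the $\alpha = \sigma_\delta^2$
case that we treated earlier, since here the variance $\sigma_\delta^2$ only
acts as a noise source and not as a source of information. The correlation
matrix of a pair of points of the homogeneous Gaussian field $\ln\rho$
separated by $r$ reads, according to (56),

$$\xi_{\ln\rho} = \begin{pmatrix} \ln\left( 1 + \sigma_\delta^2 \right) & \ln\left( 1 + \xi(r) \right) \\ \ln\left( 1 + \xi(r) \right) & \ln\left( 1 + \sigma_\delta^2 \right) \end{pmatrix}. \tag{72}$$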



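The positivity of the matrix constrains, at a fixed variance
$\sigma_\delta^2$, the values of $\xi(r)$ to the following range,

$$1 + \xi(r) = \left( 1 + \sigma_\delta^2 \right)^{\eta}, \quad -1 < \eta < 1. \tag{73}$$

Clearly, vanishing correlations correspond to $\eta = 0$, positive
correlations to $\eta > 0$, and negative correlations to $\eta < 0$. By
assumption, the parameter $\alpha$ enters $\xi(r)$ only. Therefore,

$$\frac{\partial \xi_{\ln\rho}}{\partial\alpha} = \frac{\partial \xi(r)}{\partial\alpha}\, \frac{1}{1 + \xi(r)} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{74}$$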



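Putting (71) in equation (58), we obtain that the total amount of information
on $\alpha$ is

$$F_\alpha = \left( \frac{\partial \xi(r)}{\partial\alpha} \right)^2 \frac{1}{\left( 1 + \xi(r) \right)^2}\, \frac{1 + \eta^2}{\left( 1 - \eta^2 \right)^2 \ln^2\left( 1 + \sigma_\delta^2 \right)}, \tag{75}$$

where $\eta = \ln\left( 1 + \xi(r) \right) / \ln\left( 1 + \sigma_\delta^2 \right)$.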



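To obtain the information accessible by extracting $\xi$, we first note that
under our assumption (71) the mean carries no information,

$$s_1^2 = 0. \tag{76}$$

For this reason, $s_2^2$ is given by equation (63) with $N = 2$. The only
non-zero element of the vector of derivatives
$\partial_\alpha \ln m_{\mathbf{i}}$, $|\mathbf{i}| \le 2$, is in our
configuration (71) the one for the multiindex $\mathbf{i} = (1, 1)$. From
equations (52) and (56), we obtain

$$\frac{\partial \ln m_{(1,1)}}{\partial\alpha} = \frac{\partial \xi(r)}{\partial\alpha}\, \frac{1}{1 + \xi(r)}. \tag{77}$$

It follows immediately that $s_2^2$ is given by

$$s_2^2 = \left( \frac{\partial \xi(r)}{\partial\alpha} \right)^2 \frac{1}{\left( 1 + \xi(r) \right)^2} \left[ Q_2^{-1} \right]_{(1,1)(1,1)}. \tag{78}$$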



Of special interest is the limit of low correlations, where the exact result

$$\frac{s_2^2}{F_\alpha} = \frac{1}{\sigma_\delta^4} \ln^2\left( 1 + \sigma_\delta^2 \right), \quad \text{for } \xi(r) = 0, \tag{79}$$

can be obtained by taking advantage of the simple structure of the $Q_2$
matrix. We note that $Q_2$ is in our two-point configuration a 6-dimensional
matrix, where, as seen from its representation (60), all of its elements are
products and powers of $1 + \sigma_\delta^2$ and $1 + \xi(r)$. The needed
inverse matrix element can be written as a ratio of determinants,

$$\left[ Q_2^{-1} \right]_{(1,1)(1,1)} = \frac{\det \tilde{Q}_2}{\det Q_2}, \tag{80}$$

where $\tilde{Q}_2$ is the 5-dimensional matrix obtained from $Q_2$ by removing
the row and column corresponding to the multiindex $(1, 1)$. Both of these
determinants are therefore clearly polynomials in $1 + \sigma_\delta^2$ and
$1 + \xi(r)$. The asymptotic behavior of the accessible information for large
variance can thus be obtained by noticing that $\eta \to 0$ for any value of
$\xi$. Looking then at the leading coefficients of the two polynomials
entering (80), we obtain

$$\frac{\det Q_2}{\det \tilde{Q}_2} \to \left( \sigma_\delta^2 \right)^2 \left( 1 + \xi(r) \right)^2. \tag{81}$$



Therefore, the information loss

$$\epsilon := \frac{s_2^2}{F_\alpha} \tag{82}$$

tends to, for asymptotic values of the variance,

$$\epsilon \to \frac{1}{\sigma_\delta^4}\, \frac{\ln^2\left( 1 + \sigma_\delta^2 \right)}{\left( 1 + \xi(r) \right)^2} = \frac{\epsilon(\xi = 0)}{\left( 1 + \xi(r) \right)^2}. \tag{83}$$

The second line follows from the first using the exact result for vanishing
correlations given in equation (79). The efficiency of $\xi$ in extracting the
information on any parameter therefore always goes to zero, following
approximately the inverse squared variance. The presence of substantial
positive correlations, generic for a field generated by gravitational
instability on a wide range of scales, only makes the information loss worse.
This is illustrated in figure 5, where the dotted line shows the loss of
information at the non linearity scale $\xi(r) = 1$, evaluated numerically
from equations (75) and (78), together with the exact result (79) for
vanishing correlations (solid line).
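The numerical evaluation mentioned above can be sketched as follows. This is a minimal illustration, not the code used for figure 5: it builds the $6 \times 6$ matrix $Q_2$ from its representation (60), inverts it, and divides by the Gaussian Fisher information (58) evaluated for the two-point matrix (72). The closed form used for the Fisher information is our own evaluation of the trace in (58), and the function names are assumptions.

```python
import numpy as np

# Multi-indices i = (i_1, i_2) with |i| <= 2 for a pair of points (d = 2).
MULTI_INDICES = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]

def q2_matrix(sigma2, xi):
    """Q_2 with elements (1 + sigma2)^(i1 j1 + i2 j2) * (1 + xi)^(i1 j2 + i2 j1)."""
    s, c = 1.0 + sigma2, 1.0 + xi
    return np.array([[s ** (i[0] * j[0] + i[1] * j[1]) * c ** (i[0] * j[1] + i[1] * j[0])
                      for j in MULTI_INDICES] for i in MULTI_INDICES])

def information_fraction(sigma2, xi):
    """epsilon = s_2^2 / F_alpha for a parameter entering xi(r) only."""
    a, b = np.log1p(sigma2), np.log1p(xi)  # entries of the covariance of ln(rho)
    if abs(b) >= a:
        raise ValueError("covariance matrix of ln(rho) is not positive definite")
    k = MULTI_INDICES.index((1, 1))
    q_inv_11 = np.linalg.inv(q2_matrix(sigma2, xi))[k, k]
    # Both s_2^2 and F_alpha carry the common factor (d xi / d alpha)^2 / (1 + xi)^2,
    # which cancels in the ratio.
    fisher = (a**2 + b**2) / (a**2 - b**2) ** 2  # trace term of (58) for a 2x2 matrix
    return float(q_inv_11 / fisher)

for sigma2 in (2.0, 4.0, 9.0):
    print(sigma2, information_fraction(sigma2, 0.0), information_fraction(sigma2, 1.0))
```

For $\xi(r) = 0$ the output can be compared directly with the exact result (79). For large variances the matrix $Q_2$ becomes badly scaled, so a rescaling of its rows and columns would be advisable before inversion.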

4. Summary and conclusion 

We have investigated in detail the structure of
the information within the moments of the uni- 
variate lognormal distribution, as a model for the 
matter density field. We have provided exact ex- 
pressions, equations (15^ and ([55)1 for the indepen- 
dent information content of each moment. Using 
these expressions, we have shown that the moments become dramatically
incomplete probes in the non linear regime. In the cosmologically relevant
case of the parameter entering the power spectrum of the fluctuations, the
fraction of the information that is accessible from the moments is close to
one fourth at $\sigma_\delta = 1$, and decays following the inverse 4th power
of the variance. We showed that this is mainly due to the inability of the
moments to probe the information located in the underdense regions of the
distribution. Besides, a comparison to standard perturbation theory for the
lower order moments showed that both approaches are consistent and predict
that the third order moment becomes as important as the variance itself when
entering the non linear regime.

In a second step, we extended our results to the 
multivariate case, showing that the mean of the 
field and two-point correlations become in a very 
similar manner very inefficient probes, for any 
parameter of interest. With the help of two sim- 
plified situations we have shown that the presence 
of correlations only makes the information loss 
even worse. More specifically, we have shown that 
the extraction of the two-point correlation func- 
tion at any argument provides access to a fraction 
of the information which is generically well below 
unity at the entrance of the non linear regime, and 
decays like the inverse squared variance. 

These results, making clear that the informa- 
tion content of the lognormal field not only gets 
transferred to higher order point functions, but 
also becomes largely inaccessible to the correla- 
tion function hierarchy in the non linear regime, 
confirm to full extent qualitative suspicions raised
in (Coles & Jones 1991, section 4), to which we
refer for a more complete discussion on physical 
arguments that may source such a behavior. 

We can understand for any random field whether the hierarchy members are
promising probes or not from the following considerations. Let
$p[\phi, \alpha]$ be the probability density function for a realization $\phi$
of the field. As we have seen, the information content of the first $N$
correlation functions is based on the approximation of the score function
$\partial_\alpha \ln p[\phi]$ through polynomials up to order $N$, over the
range of $p[\phi]$. Let us assume for instance that for any value of the model
parameter $\ln p[\phi]$ can be expanded in a low order Taylor series in the
field,

$$\ln p[\phi, \alpha] = \sum_{n=0}^{N} \int dx_1 \cdots dx_n\, \lambda_n(x_1, \dots, x_n, \alpha)\, \phi(x_1) \cdots \phi(x_n). \tag{84}$$

This class of distributions, including Gaussian fields for which $N = 2$, can
arise notably as maximal entropy distributions for fixed values of the first
$N$ correlation functions. The coefficient $\lambda_n$ is called in this
framework the potential associated to the $n$-th correlation function (see
e.g. Jaynes 1983; Caticha 2008). Since $\partial_\alpha \ln p$ is itself a
polynomial of order $N$, all the information is contained in the first $N$
correlation functions. Of course, the relative importance of each one of these
will be modulated by the sensitivity of the potentials to the parameter
$\alpha$ and their covariances. This situation is certainly the one where the
members of the correlation function hierarchy are the probes of choice, since
only a finite number of these grasp the entire information content.
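The Gaussian case $N = 2$ gives a simple check of this statement. The sketch below considers a single Gaussian variable whose mean and variance both depend on the parameter, and compares the total Fisher information with the information carried by the first two moments. The univariate reduction, the function name and the numerical values are illustrative assumptions.

```python
from fractions import Fraction

def gaussian_information(mu, v, dmu, dv):
    """Return (total Fisher information, information in the first two moments)."""
    fisher = dmu**2 / v + dv**2 / (2 * v**2)
    # Derivatives of m_1 = mu and m_2 = mu^2 + v with respect to the parameter.
    dm1, dm2 = dmu, 2 * mu * dmu + dv
    # Covariance matrix of (x, x^2) for a Gaussian variable.
    c11 = v
    c12 = 2 * mu * v
    c22 = 4 * mu**2 * v + 2 * v**2
    det = c11 * c22 - c12**2
    # Quadratic form dm^T C^{-1} dm, using the explicit inverse of a 2x2 matrix.
    from_moments = (c22 * dm1**2 - 2 * c12 * dm1 * dm2 + c11 * dm2**2) / det
    return fisher, from_moments

# Arbitrary illustrative values for the mean, the variance and their derivatives.
fisher, from_moments = gaussian_information(
    mu=Fraction(3, 2), v=Fraction(2), dmu=Fraction(1, 3), dv=Fraction(5, 7))
print(fisher, from_moments)
```

The two numbers agree exactly, whatever values are chosen: for a Gaussian variable the first two moments exhaust the information, in contrast to the lognormal case treated in this work.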

Two different processes may at this point render the hierarchy inefficient,
or incomplete. First, a large number of terms may be needed in the expansion
to reproduce accurately the score function $\partial_\alpha \ln p$. In this
case, one would need to go deep down the hierarchy in order to catch the
information. This is certainly not desirable. The second case occurs when
$\partial_\alpha \ln p$ has no Taylor expansion at all over the relevant
range. It is then simply not possible to represent the score function
accurately. Parts of the information (given in the field analog of equation
(15)) become invisible to the correlation function hierarchy. The lack of a
Taylor expansion for the logarithm function is the reason for the failure of
the moments and correlation functions to catch the information of a lognormal
field in the non linear regime, when the range of the probability density
function becomes very large.



We emphasize, as in (Coles & Jones 1991), that these peculiar dynamics of the
information are not due to a pathological character of the lognormal
distribution. They should be expected for any distribution decaying slowly at
infinity. We can add to their discussion that this is so because $\ln p$
cannot be well reproduced by polynomials under this condition.


Given the very high amplitude of these effects within the lognormal
assumption, we believe that in order to get the best out of future galaxy
survey data, it is crucial to understand these issues better. The present work
presents first steps towards this aim. It is also consistent with, and brings
strong support to, the recent studies started in (Neyrinck et al. 2009; Wang
et al. 2011) making in the non linear regime the logarithm of the field rather
than the field itself the central quantity of interest. It remains however to
be seen to what extent the approach presented in this work is able to provide
quantitative predictions for the statistical power of higher point functions,
or for power spectrum extraction. We leave these aspects for future work.



We would like to thank Adam Amara, Simon 
Lilly, Alexander Szalay and Mark Neyrinck for 
useful discussions, and acknowledge the support 
of the Swiss National Science Foundation. 



Fig. 5. The loss of information $\epsilon$ in extracting correlations of
$\rho$ instead of $\ln\rho$ for a lognormal field, equation (82), in the limit
of vanishing correlations (solid line), and at the non linearity scale
$\xi(r) = 1$ (dashed) as function of the square root of the variance of the
fluctuations.

A. Derivation of relation (25)

To prove (25), we note that both sides of the equation are polynomials of
degree $n$ in $t$, and that the zeroes of the right hand side are given by

$$t = q^{-i}, \quad i = 0, \dots, n-1. \tag{A1}$$

We first show that the left hand side evaluated at these points vanishes as
well, so that the two polynomials must be proportional. We then find the
constant of proportionality by requiring $P_n$ to have the correct
normalization.

The first step is performed by noting that

$$\left\langle P_n(q^{-i} x) \right\rangle = \frac{1}{m_i} \left\langle P_n(x)\, x^i \right\rangle, \quad i = 0, 1, \dots \tag{A2}$$

an identity which is proven by expanding $P_n$ on both sides of the equation
in terms of the transition matrix $C$, and using the relation (22) between the
moments. Since $P_n$ is by construction orthogonal to any polynomial of
strictly lower degree, we have indeed

$$\left\langle P_n(q^{-i} x) \right\rangle = 0, \quad i = 0, \dots, n-1. \tag{A3}$$

This implies

$$\sum_{k=0}^{n} C_{nk}\, m_k\, t^k = a_n\, (t : q)_n \tag{A4}$$

for some constant of proportionality $a_n$. To find it, we note that by
expanding the normalization condition of $P_n$,

$$1 = \left\langle P_n^2(x) \right\rangle, \tag{A5}$$

using again property (22), it must hold that

$$1 = \sum_{i,j=0}^{n} C_{ni}\, m_i\, C_{nj}\, m_j\, q^{-ij}. \tag{A6}$$

The sums can be performed using equation (A4), leading to the following
equation for $a_n$,

$$1 = (-1)^n a_n^2\, q^{n(n-1)/2}\, (q^{-n} : q)_n. \tag{A7}$$

This expression simplifies to

$$a_n^2 = \frac{q^n}{(q : q)_n}, \tag{A8}$$

and the sign of $a_n$ must be $(-1)^n$ in order to have a positive matrix
element $C_{nn}$. This concludes the proof of (25).



B. Derivation of the representation (42)

In order to get the explicit series representation (42), we first obtain from
relation (25) the exact expression of the transition matrix $C$. The expansion
of the $q$-Pochhammer symbol on the right hand side of (25) in powers of $t$
is the Cauchy binomial theorem,

$$(t : q)_n = \sum_{k=0}^{n} \binom{n}{k}_q (-1)^k\, q^{k(k-1)/2}\, t^k, \tag{B1}$$

where

$$\binom{n}{k}_q = \frac{(q : q)_n}{(q : q)_k\, (q : q)_{n-k}}$$

is the Gaussian binomial coefficient. Matching powers of $t$ in (25) we obtain
the explicit form

$$C_{nk} = \frac{(-1)^{n-k}}{m_k}\, \frac{q^{n/2}}{\sqrt{(q : q)_n}} \binom{n}{k}_q q^{k(k-1)/2}.$$

Therefore, interchanging the $n$ and $k$ sums in (42), it holds



fc 00 
X \ sr^ q" 



^ 1 - o" 



n=l A;=l ^ ' n=k 

With the help of some algebra the following identity is not difficult to show 



n—k 



1 



{q ■■q)k'^-q^'' 



k> 1. 



Consequently, the series expansion of 5^(2:) is given by 



(B2) 



(B3) 



(B4) 



(B5) 



(B6) 



C. Several variables 

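We first need a little bit of notation. For a variable $X$ taking values
$x = (x_1, \dots, x_d)$, we use the multiindex notation

$$x^{\mathbf{n}} = x_1^{n_1} \cdots x_d^{n_d}, \quad \mathbf{n} = (n_1, \dots, n_d), \quad n_i = 0, 1, \dots, \qquad |\mathbf{n}| := \sum_{i=1}^{d} n_i, \tag{C1}$$

where $|\mathbf{n}|$ is the order of the multiindex $\mathbf{n}$. A moment of
order $N$ is given by

$$m_{\mathbf{n}} = \left\langle x^{\mathbf{n}} \right\rangle, \quad |\mathbf{n}| = N, \tag{C2}$$

and the covariance between the moments is

$$\Sigma_{\mathbf{n}\mathbf{m}} = m_{\mathbf{n}+\mathbf{m}} - m_{\mathbf{n}}\, m_{\mathbf{m}}. \tag{C3}$$

In this notation, the decomposition of the information in independent bits of
order $N$ proceeds by strict analogy with the one dimensional case. We refer
to (Dunkl & Xu 2001) for the general theory of orthogonal polynomials in
several variables. A main difference is that at a fixed order $N$ there are
not one but $\binom{N+d-1}{N}$ independent orthogonal polynomials, which are
not uniquely defined. The orthogonality of the polynomials of the same order
is not essential for our purposes; requiring the following condition is
enough,

$$\left\langle P_{\mathbf{n}}(x) P_{\mathbf{m}}(x) \right\rangle = 0, \quad |\mathbf{n}| \neq |\mathbf{m}|, \qquad \left\langle P_{\mathbf{n}}(x) P_{\mathbf{m}}(x) \right\rangle = [H_N]_{\mathbf{n}\mathbf{m}}, \quad |\mathbf{n}| = |\mathbf{m}| = N, \tag{C4}$$

for some matrices $H_N$. The component of some function $f$ parallel to the
polynomial $P_{\mathbf{n}}$ is

$$s_{\mathbf{n}} := \left\langle f(x) P_{\mathbf{n}}(x) \right\rangle, \tag{C5}$$

and the expansion of $f$ in terms of these polynomials reads, in the notation
of (Dunkl & Xu 2001, section 3.5),

$$S_N(x) = \sum_{n=0}^{N}\ \sum_{|\mathbf{n}|,|\mathbf{m}|=n} s_{\mathbf{n}} \left[ H_n^{-1} \right]_{\mathbf{n}\mathbf{m}} P_{\mathbf{m}}(x). \tag{C6}$$

It will converge to the actual function $f$ for $N \to \infty$ if the set of
polynomials is complete, whereas it may not otherwise. The expansion is also
independent of the freedom there is in the choice of the orthogonal
polynomials in equations (C4). Writing the orthogonal polynomials in terms of
a triangular transition matrix,

$$P_{\mathbf{n}}(x) = \sum_{|\mathbf{m}| \le |\mathbf{n}|} C_{\mathbf{n}\mathbf{m}}\, x^{\mathbf{m}}, \tag{C7}$$

and taking $f$ as the score function $s(x, \alpha)$, it is simple to see that
the independent bits of information of order $n$ are given by

$$s_n^2 = \sum_{|\mathbf{n}|,|\mathbf{m}|=n}\ \sum_{|\mathbf{i}|,|\mathbf{j}| \le n} C_{\mathbf{n}\mathbf{i}} \frac{\partial m_{\mathbf{i}}}{\partial\alpha} \left[ H_n^{-1} \right]_{\mathbf{n}\mathbf{m}} C_{\mathbf{m}\mathbf{j}} \frac{\partial m_{\mathbf{j}}}{\partial\alpha}, \tag{C8}$$

and the strict analog of equation (11) holds for each $N$,

$$\sum_{n=1}^{N} s_n^2 = \sum_{|\mathbf{i}|,|\mathbf{j}|=1}^{N} \frac{\partial m_{\mathbf{i}}}{\partial\alpha} \left[ \Sigma^{-1} \right]_{\mathbf{i}\mathbf{j}} \frac{\partial m_{\mathbf{j}}}{\partial\alpha}. \tag{C9}$$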